The data, provided by Prosper and made available by Udacity through the Data Analyst program, is data on loans made through the Prosper platform, an online fintech company that offers peer-to-peer financing in which individuals can loan money to individual borrowers for specific purposes, a different means of financing from the traditional loans.
In the dataset, we can examine and explore the characteristics of a Prosper loan (Interest Rate, APR etc.) and a Prosper borrower (Credit Score etc.) that may make for an interesting insight into how the platform operates, finances itself, and is lucrative for investors.
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
## [1] "/Users/andrewteixeira/Documents/DAND/Final_Project_EDA"
## [1] "/Users/andrewteixeira/Documents/DAND/Final_Project_EDA"
Load the data into the variable “loans”
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
## ListingKey ListingNumber ListingCreationDate
## 1 1021339766868145413AB3B 193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1 1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A 81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A 658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2 909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D 1074836 2013-12-14 08:26:37.093000000
## CreditGrade Term LoanStatus ClosedDate BorrowerAPR BorrowerRate
## 1 C 36 Completed 2009-08-14 00:00:00 0.16516 0.1580
## 2 36 Current 0.12016 0.0920
## 3 HR 36 Completed 2009-12-17 00:00:00 0.28269 0.2750
## 4 36 Current 0.12528 0.0974
## 5 36 Current 0.24614 0.2085
## 6 60 Current 0.15425 0.1314
## LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1 0.1380 NA NA NA
## 2 0.0820 0.07960 0.0249 0.05470
## 3 0.2400 NA NA NA
## 4 0.0874 0.08490 0.0249 0.06000
## 5 0.1985 0.18316 0.0925 0.09066
## 6 0.1214 0.11567 0.0449 0.07077
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1 NA NA
## 2 6 A 7
## 3 NA NA
## 4 6 A 9
## 5 3 D 4
## 6 5 B 10
## ListingCategory..numeric. BorrowerState Occupation EmploymentStatus
## 1 0 CO Other Self-employed
## 2 2 CO Professional Employed
## 3 0 GA Other Not available
## 4 16 GA Skilled Labor Employed
## 5 2 MN Executive Employed
## 6 1 NM Professional Employed
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1 2 True True
## 2 44 False False
## 3 NA False True
## 4 113 True False
## 5 44 True False
## 6 82 True False
## GroupKey DateCreditPulled
## 1 2007-08-26 18:41:46.780000000
## 2 2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4 2012-10-22 11:02:32
## 5 2013-09-14 18:38:44
## 6 2013-12-14 08:26:40
## CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1 640 659 2001-10-11 00:00:00
## 2 680 699 1996-03-18 00:00:00
## 3 480 499 2002-07-27 00:00:00
## 4 800 819 1983-02-28 00:00:00
## 5 680 699 2004-02-20 00:00:00
## 6 740 759 1973-03-01 00:00:00
## CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1 5 4 12
## 2 14 14 29
## 3 NA NA 3
## 4 5 5 29
## 5 19 19 49
## 6 21 17 49
## OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1 1 24 3
## 2 13 389 3
## 3 0 0 0
## 4 7 115 0
## 5 6 220 1
## 6 13 1410 0
## TotalInquiries CurrentDelinquencies AmountDelinquent
## 1 3 2 472
## 2 5 0 0
## 3 1 1 NA
## 4 1 4 10056
## 5 9 0 0
## 6 2 0 0
## DelinquenciesLast7Years PublicRecordsLast10Years
## 1 4 0
## 2 0 1
## 3 0 0
## 4 14 0
## 5 0 0
## 6 0 0
## PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1 0 0 0.00
## 2 0 3989 0.21
## 3 NA NA NA
## 4 0 1444 0.04
## 5 0 6193 0.81
## 6 0 62999 0.39
## AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1 1500 11 0.81
## 2 10266 29 1.00
## 3 NA NA NA
## 4 30754 26 0.76
## 5 695 39 0.95
## 6 86509 47 1.00
## TradesOpenedLast6Months DebtToIncomeRatio IncomeRange
## 1 0 0.17 $25,000-49,999
## 2 2 0.18 $50,000-74,999
## 3 NA 0.06 Not displayed
## 4 0 0.15 $25,000-49,999
## 5 2 0.26 $100,000+
## 6 0 0.36 $100,000+
## IncomeVerifiable StatedMonthlyIncome LoanKey
## 1 True 3083.333 E33A3400205839220442E84
## 2 True 6125.000 9E3B37071505919926B1D82
## 3 True 2083.333 6954337960046817851BCB2
## 4 True 2875.000 A0393664465886295619C51
## 5 True 9583.333 A180369302188889200689E
## 6 True 8333.333 C3D63702273952547E79520
## TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 1 11 11
## 6 NA NA NA
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 0 0
## 6 NA NA
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 11000 9947.9
## 6 NA NA
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1 NA 0
## 2 NA 0
## 3 NA 0
## 4 NA 0
## 5 NA 0
## 6 NA 0
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1 NA 78 19141
## 2 NA 0 134815
## 3 NA 86 6466
## 4 NA 16 77296
## 5 NA 6 102670
## 6 NA 3 123257
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1 9425 2007-09-12 00:00:00 Q3 2007
## 2 10000 2014-03-03 00:00:00 Q1 2014
## 3 3001 2007-01-17 00:00:00 Q1 2007
## 4 10000 2012-11-01 00:00:00 Q4 2012
## 5 15000 2013-09-20 00:00:00 Q3 2013
## 6 15000 2013-12-24 00:00:00 Q4 2013
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA 330.43 11396.14
## 2 1D13370546739025387B2F4 318.93 0.00
## 3 5F7033715035555618FA612 123.32 4186.63
## 4 9ADE356069835475068C6D2 321.45 5143.20
## 5 36CE356043264555721F06C 563.97 2819.85
## 6 874A3701157341738DE458F 342.37 679.34
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1 9425.00 1971.14 -133.18
## 2 0.00 0.00 0.00
## 3 3001.00 1185.63 -24.20
## 4 4091.09 1052.11 -108.01
## 5 1563.22 1256.63 -60.27
## 6 351.89 327.45 -25.33
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1 0 1 0
## 2 0 1 0
## 3 0 1 0
## 4 0 1 0
## 5 0 1 0
## 6 0 1 0
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1 0 0 258
## 2 0 0 1
## 3 0 0 41
## 4 0 0 158
## 5 0 0 20
## 6 0 0 1
We will be looking at a few variables that relate to both the loan itself (dollar amount, term) and the borrower (credit score, prosper rating) in order to determine what is likely to influence the total amount Prosper is willing to lend a borrower and at what rate.
We will be examining the LoanOriginalAmount data from the loans dataset to better understand the frequency of loan values provided by Prosper.
##
## 4000 15000 10000 5000 2000 3000 25000 20000 1000 2500 7500 7000
## 14333 12407 11106 6990 6067 5749 3630 3291 3206 2992 2975 2949
## 6000 3500 8000 12000 9000 13000 1500 4500
## 2869 2567 2442 1921 1695 1509 1507 1406
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The table above shows that loans of 4,000 are the most provided by Propser. This value is below the median of 6500 and the Mean of 8337, which suggests there are some large loans that are pulling the mean forward.
The max loan provided by Prosper was 35,000.
The histogram above shows a spike at around 3,000- 4,000. This suggests there are a great number of loans provided in this area. Let’s examine further.
##
## 4000 15000 10000 5000 2000 3000 25000 20000 1000 2500 7500 7000
## 14333 12407 11106 6990 6067 5749 3630 3291 3206 2992 2975 2949
## 6000 3500 8000 12000 9000 13000 1500 4500
## 2869 2567 2442 1921 1695 1509 1507 1406
Let’s look at the loan term provided by Prosper to get a sense of what types of loans and duration of loans Prosper gives to their borrowers.
##
## 12 60 36
## 1614 24545 87778
##
## 12 36 60
## 1614 87778 24545
From the table and histogram above, we see that Prosper only gives loans in 12, 36, and 60 month terms. The majority of loans are provided on a 36 month term payment plan.
We will examine the amount of money that is delinquent per borrower per Prosper loan.
##
## 80 35 70 200 10 20 130 67 150 15 55 65 120 40 60 75 30 25
## 28 29 29 29 30 30 31 32 32 34 34 34 35 43 46 56 65 66
## 100 50
## 67 73
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 0.0 0.0 984.5 0.0 463881.0 7622
##
## 0 50 100 25 30 75 60 40 120 15 55 65
## 89818 73 67 66 65 56 46 43 35 34 34 34
## 67 150 130 10 20 35 70 200 80 33 45 54
## 32 32 31 30 30 29 29 29 28 27 26 26
## 79 110 300 36 51 57 68 88 90 105 134 32
## 26 26 26 25 25 25 25 25 25 25 25 24
## 48 52 84 230 53 78 101 115 158 350 31 37
## 24 24 24 24 23 23 23 23 23 23 22 22
## 42 64
## 22 22
The data above in the histograms and table shows that the majority of Prosper borrowers have 0 dollars in delinquency, a good sign if you are a lender.
Examining the Delinquencies Last 7 Years variable will help us understand the type of borrowers that are included in this dataset. DelinquenciesLast7Years may provide a glimpse into the reliability of borrowers in this set.
##
## 0 1 3 2 4 5 6 7 8 9 10 11
## 76439 3967 3183 2879 2592 1826 1790 1648 1421 1208 1151 1075
## 12 13 14 15 16 17 18 20
## 982 873 821 795 731 608 574 565
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 0.000 4.155 3.000 99.000 990
Findings : Unsurprisingly, we see the vast majority of delinquent loans in the last 7 years for borrowers at 0. The median reflects this by also being 0, however, a mean of 4.155 shows that there are some potentially large amounts of delinquencies in the last 7 years per borrowers that may be driving this value up.
In the table, we see that there are, in fact, 110 borrowers who have 99 delinquencies in the past 7 years. This appears to be a strange lift from 98 (3 total borrowers). Is this a maximum value? Could it be human error?
Findings: A simple histogram shows that, in fact, the majority of delinquencies in last 7 years for borrowers were negatively skewed, unsurprisingly considering that banks and lenders are in the business of ensuring their loans are paid back. Let’s examine and plot that surreptitious the DelinquenciesLast7Years on the larger end of the plot.
Findings The histogram from DelinquenciesLast7Years > 79 show that the #99 is a strange occurance that may be artifically driving the mean up for the total DelinquenciesLast7Years. We will attempt to determine, later in this analysis, what makes an individual become delinquent with some multivariate tests and plots.
We are getting a better sense of the types of borrowers Prosper loans money to. Now, we will examine the BorrowerAPR to determine how Prosper makes $ of their customers.
The findings above dictate a few peaks: Once between .15 and .20, again at .3, and once more at around .36.
Let’s create a sorted table to examine these trends.
##
## 0.35797 0.35643 0.37453 0.30532 0.2951 0.35356 0.29776 0.15833 0.24246
## 3672 1644 1260 902 747 721 707 652 605
## 0.24758 0.12528 0.17359 0.15324 0.27462 0.27285 0.13799 0.15713 0.12691
## 601 559 549 547 534 506 489 482 456
## 0.25781 0.20735
## 444 433
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229 25
Interestingly, we see that Prosper’s most commonly issued APR is .3672, this is well above the mean of .21976 and the median of .20976.
Let’s remove all NA’s from APR for analysis - adding the Mean line, we can examine how many values are falling above or below the mean.
## [1] 0.2188277
We see a number of peaks and valleys within the BorrowerAPR data, which suggests there are certain integers that Prosper commonly issues when it comes to BorrowerAPR.
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 0 806
Findings Overall, the income of borrowers is failing between 25,000 and 74,999.
##
## CA TX NY FL IL GA OH MI VA NJ NC
## 14717 6842 6729 6720 5921 5515 5008 4197 3593 3278 3097 3084
## WA PA MD MO MN MA CO IN AZ WI OR TN
## 3048 2972 2821 2615 2318 2242 2210 2078 1901 1842 1817 1737
## AL CT SC NV KS KY OK LA UT AR MS NE
## 1679 1627 1122 1090 1062 983 971 954 877 855 787 674
## ID NH NM RI HI WV DC MT DE VT AK SD
## 599 551 472 435 409 391 382 330 300 207 200 189
## IA WY ME ND
## 186 150 101 52
From the table above, we can see that California has the largeset amount of users on Prosper with 14,717 borrowers, while North Dakota has the least with 52 borrowers.
##
## 2 3 1 4 5 0 6 7 8 9 10 11
## 14887 13934 13785 12148 10098 8430 7607 6171 4692 3779 2914 2431
## 12 13 14 15 16 17 18 19
## 1786 1453 1245 978 864 724 581 539
The data above show that most borrowers have very few inquiries on their credit, potentially showing that Prosper selects those who are steadfast and committed to paying down their debts. Let’s create a plot in which the number 0 is removed, showing only borrowers with 0 inquiries
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 4.000 6.036 7.000 379.000
From the plot above, we see a steep decline in total inquiries values after the initial 1,2 or 3 total inquiries.
Given that the 10 value appears to be an anomaly, let’s run a summary on Debt to Income Ratio
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
Let’s further examine Debt to Income Ratios that are above the 3rd Quartile.
##
## 0.33 0.35 0.34 0.36 0.37 0.38 0.39 0.4 0.41
## 1902 1812 1810 1523 1399 1364 1226 1109 995
## 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5
## 893 851 753 697 634 617 524 470 447
## 0.51 0.52 0.54 0.53 0.55 10.01 0.56 0.57 0.59
## 431 372 341 320 279 272 244 229 182
## 0.58 0.6 0.61 0.62 0.63 0.65 0.64 0.66 0.67
## 166 149 140 120 108 105 91 78 77
## 0.68 0.71 0.7 0.69 0.72 0.75 0.73 0.76 0.74
## 72 67 64 58 58 51 43 43 40
## 0.78 0.77 0.86 0.87 0.81 0.84 0.85 0.79 0.83
## 38 35 29 29 27 27 27 26 26
## 0.8 0.82 0.88 0.89 0.9 0.91 0.94 0.96 0.97
## 24 22 22 22 20 19 19 19 18
## 0.92 0.93 1 0.98 0.99 1.02 0.95 1.01 1.05
## 15 14 14 13 13 12 11 11 11
## 1.06 1.07 1.03 1.19 1.21 1.04 1.12 1.16 1.08
## 11 11 10 10 10 8 8 8 7
## 1.1 1.27 1.14 1.15 1.18 1.22 1.49 1.2 1.25
## 7 7 6 6 6 6 6 5 5
## 1.31 1.56 1.09 1.11 1.24 1.32 1.36 1.47 1.54
## 5 5 4 4 4 4 4 4 4
## 2.2 1.13 1.29 1.34 1.39 1.41 1.42 1.45 1.46
## 4 3 3 3 3 3 3 3 3
## 1.6 1.75 1.79 1.81 1.87 1.89 1.99 2.3 2.38
## 3 3 3 3 3 3 3 3 3
## 2.45 2.72 3.49 3.92 1.17 1.23 1.26 1.28 1.3
## 3 3 3 3 2 2 2 2 2
## 1.37 1.38 1.5 1.51 1.52 1.55 1.57 1.58 1.61
## 2 2 2 2 2 2 2 2 2
## 1.65 1.72 1.74 1.8 1.82 1.93 1.94 1.98 2.02
## 2 2 2 2 2 2 2 2 2
## 2.07 2.1 2.19 2.34 2.5 2.6 2.65 2.79 2.95
## 2 2 2 2 2 2 2 2 2
## 2.99 3.23 3.27 3.53 3.64 3.73 3.81 4.4 4.43
## 2 2 2 2 2 2 2 2 2
## 4.71 4.9 5.36 5.56 7.27 0.32007 0.321 0.32388 0.33348
## 2 2 2 2 2 1 1 1 1
## 0.34443 0.34452 0.35155 0.35333 0.3543 0.3555 0.35972 0.36006 0.36232
## 1 1 1 1 1 1 1 1 1
## 0.36331 0.37002 0.37772 0.38361 0.3838 0.38587 0.39181 0.39531 0.39846
## 1 1 1 1 1 1 1 1 1
## 0.40466 0.40475 0.40572 0.4137 0.41586 0.42965 0.42972 0.43886 0.4413
## 1 1 1 1 1 1 1 1 1
## 0.46403 0.499 0.50136 0.50872 0.51504 0.54433 0.56576 0.57091 0.60618
## 1 1 1 1 1 1 1 1 1
## 0.78342 1.35 1.4 1.43 1.44 1.53 1.59 1.62 1.64
## 1 1 1 1 1 1 1 1 1
## 1.66 1.67 1.68 1.69 1.7 1.73 1.76 1.78 1.85
## 1 1 1 1 1 1 1 1 1
## 1.86 1.9 1.92 1.95 1.96 2.05 2.08 2.09 2.11
## 1 1 1 1 1 1 1 1 1
## 2.14 2.16 2.18 2.21 2.22 2.23 2.25 2.27329 2.28
## 1 1 1 1 1 1 1 1 1
## 2.31 2.35 2.43 2.47 2.48 2.49 2.53 2.55 2.57
## 1 1 1 1 1 1 1 1 1
## 2.59 2.63 2.66 2.67 2.7 2.74 2.83 2.86 2.94
## 1 1 1 1 1 1 1 1 1
## 2.97 3.07 3.09 3.11 3.14 3.17 3.19 3.22 3.26
## 1 1 1 1 1 1 1 1 1
## 3.28 3.29 3.32 3.38 3.39 3.45 3.54 3.58 3.59
## 1 1 1 1 1 1 1 1 1
## 3.66 3.68 3.71 3.76 3.77 3.86 4 4.03 4.04
## 1 1 1 1 1 1 1 1 1
## 4.13 4.14 4.15 4.16 4.19 4.21 4.27 4.29 4.32
## 1 1 1 1 1 1 1 1 1
## 4.33 4.37 4.42 4.44 4.54 4.58 4.66 4.68 4.75
## 1 1 1 1 1 1 1 1 1
## 4.76 4.78 4.84 4.85 4.86 4.89 5.02 5.05 5.06
## 1 1 1 1 1 1 1 1 1
## 5.1 5.15 5.16 5.18 5.21 5.23 5.26 5.29 5.31
## 1 1 1 1 1 1 1 1 1
## 5.34 5.38 5.55 5.5608 5.59 5.64 5.65 5.67 5.69
## 1 1 1 1 1 1 1 1 1
## 5.76 5.98 6.01 6.05 6.06 6.07 6.2 6.37 6.49
## 1 1 1 1 1 1 1 1 1
## 6.51 6.66 6.68 6.85 6.95 6.96 7.06 7.22 7.42
## 1 1 1 1 1 1 1 1 1
## 7.47 7.52 7.59 7.79 7.9 7.99 8.06 8.13 8.27
## 1 1 1 1 1 1 1 1 1
## 8.52 8.53 8.63 9.06 9.2 9.44 9.77
## 1 1 1 1 1 1 1
Findings From the data above, that there are a few loans at various price points that have obscenely high debt to income ratios. Perhaps these folks have a great deal of collateral to put up. Let’s look at all the DTI ratios under or equal to 1.
From the chart above, we see that the majority of Debt-to-Income ratios are falling under .5, no surprise as the 3rd quartile value is .32.
The graph above shows that most borrowers are “employed” or categorized as “full-time”
Let’s examine the credit scores of applicant’s by creating a new variable that takes the mean of credit score upper and credit score lower. We will, when creating a histogram, start the x-axis at 300 to avoid lower values that are outliers or have omitted values.
From the above, we can see that most credit scores fall within 650 and 700. Let’s bucket those credit scores to create a categorical variable for later analysis.
When we look at the bucket bar chart below, we see there are a nearly equal amount of loans provided to average credit scores of between 600-700 and 700-800, with very few happening below or above.
The structure of the dataset includes 113,937 observations of 81 variables, meaning there are 113,937 individual loans oberved. We have created two new variables to better examine credit score. avgCreditScore takes the average of the upper and lower credit score to take the mean for each individual borrower. The new variable CSBucket is a categorical variable that will help determine how credit may influence loans.
The main features of interest for me in the dataset include: Prosper Score: What factors contribute most, or have the highest correlation, to establshing a Prosper rating.
Also, is the LoanOriginalAmount for highly related to Income Range and credit score?
I am also interested in what levels of credit scores influence the eventual borrower APR. Are there distinct patterns between the credit worthiness of a borrower and whether they get a loan?
Yes, I created the avgCreditScore variable to determine what number lies between the lower and upper credit score for each user. Then, I used the cut() function in order to bin the average credit scores per user.
of the data? If so, why did you do this?
I had removed na’s and blank values from a number of individual datapoints in order to better run analysis such as bar graphs.
##
## 1 2 3 4 5 6 7
## 6935 9795 14274 18345 15581 14551 5372
From the box plot above, we see that loans with Prosper Ratings of A, AA, and BB have much higher quartiles, outliers, and loan original amounts, which makes sense given that Prosper Ratings of C, D, E, HR are of lesser quality according to the Prosper website.
Let’s limit the ylim to further examine the median of the loan categories.
From the zoomed in data, we can see that AA loans have the highest median of any category at slightly above 10,000. Let’s run a table to see what the median # actually is.
## loans$ProsperRating..Alpha.:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2500 4500 6159 7904 25000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 5850 10000 11460 15000 35000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 6000 10940 11584 16000 35000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 6000 10000 11622 15000 35000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 5000 10000 10392 15000 25000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6100 7083 10000 15000
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3600 4000 4586 5000 15900
## --------------------------------------------------------
## loans$ProsperRating..Alpha.: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3000 4000 3463 4000 16800
The table above shows AA ratings with the median of 10,940, the highest of any category.
We’ve examined the loan original amount from the Prosper Rating, let’s see if borrower delinquencies have any effect on the loan original amount.
The plot above shows some obvious levels to the data, as Propser seems to provide loans in 5,000 increments, however, we do see a slight pattern in which, when Delinquencies in the last 7 years rise, the total amount of a loan often falls.
Let’s add transparency and jitter in order to account for overplotting.
The above chart, with it’s dark line on the y-axis at 0, shows that Prosper very much likes to provide loans to those with little to no delinquencies in the past 7 years. There are a few points at 99 in the chart, which may be an anomoly or human error, and is worth further questioning of Prosper’s data capture.
Building on the picture we are beginning to see with regards to loan original amount, let’s examine Employment status and loan original amount.
## Employed Full-time Not available Not employed
## 2255 67322 26355 5347 835
## Other Part-time Retired Self-employed
## 3806 1088 795 6134
In this data, I will create a new DF called loans_by_employmentstats as there are 2255 borrowers in which the status is blank, you can see this in the table above.
The boxplot above shows that employed Prosper borrowers, either through a company or self-employed, have higher median loan amounts than other categories, including Full-time and Part-time. This is odd, as employed persons are logically either employed full-time or part-time or self-employed. This is a question for Prosper as to what the classification methodology is.
By placing the overall median line on the graph, we see that only the categories of “Employed” and “self-employed” actually have a higher median amount than the overall median, with the other categories falling behind.
We’ve got a sense of the borrowers, now let’s take a look at the loan itself by examining the loan original amount and its estimated return.
The graph above, with slight transparency and jitter, show that the estimated return for most loans is falling around .05 and 1.5. Interestingly, there are a fair amount of loans that have negative returns, much more than those that have estimated returns of .2 or more.
A Pearson correlation test below shows a negative correlation of -.28. I would have predicted greater correlation, however, the negative number suggests as the loan original amount increases, the estimated return decreases, a common phenomenmon.
##
## Pearson's product-moment correlation
##
## data: loans$EstimatedReturn and loans$LoanOriginalAmount
## t = -86.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2922833 -0.2799279
## sample estimates:
## cor
## -0.2861175
## [1] 80.47523
By taking the mean amount of investors per loan, which is calculated to around 80, we can begin to see how many people it actually takes to get a Prosper loan fully-funed. A scatterplot can visualize this phenomeon.
The scatter plot above shows that, as loan amounts grow, so do the total amount of investors needed. The mean of just above 80 shows that the majority of loans are falling at a similar price level, thus a similar amount of investors are needed for analogous loans.
A correlation below shows a Pearson score of .38, lesser than what I expected.
##
## Pearson's product-moment correlation
##
## data: loans$Investors and loans$LoanOriginalAmount
## t = 138.71, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3751140 0.3850494
## sample estimates:
## cor
## 0.3800926
We will, finally, use our newly binned data to examine loan original amount by credit score bucket. This will provide a nice picture as to which credit score buckets have the highest median Loan Original amount.
Unsurprsiingly, the buckets of 700-800 and 800-900 both have medians that exceed the overall median. It is interesting to note that 600-700 credit scores have a median Loan Original Amount less than the overall median, an indication credit scores in this area are not deemed as worthy for a loan.
Most interestingly, I discovered that a credit score of 600-700 has a substantially lower median loan original amount than credit scores of 700-800. This was surprising to me, as I suspected a slight decrease but, personally, have always considered a credit score of 600-700 to be somewhat trustworthy.
Also, the total amount of investors per loan was quite intriguing, as there were substantial outliers from the mean of 80 per loan. We will further examine this by exploring categories that may influence that in the multi-variate analysis portion.
Yes, the relationship between employment status was particularly interesting, as “full time” employees have a signficiantly lower median loan amount than those simply categorized as “employed”. I suspect there may be an error in Prosper’s classification system.
The relationship between total investors and dollar amount had a Pearson score of .38, the strongest relationship I uncovered.
Interestingly, loan original amount and estimated return were negatively correlated, showing that as a loan amount is greater, estimated returns are generally expected to fail.
The above scatterplot shows that csBucket 800-900 are largely consolidated at a little less than .1 for their Borrower APR, despite loan amount. This shows that Prosper generally loans funds to those with good credit at various levels of dollar amount.
You can see distinct, almost vertical patterns of increased BorrowerAPR based on their credit score. Lower credit scores from 400-600 are largely consolidated lower on the y-axis at varying APRs, showing that credit scores that are lesser do not get loaned as large amounts as other credit scores.
Similarly to our analysis on borrower apr, estimated returns are largely attributed to certain credit scores. The highest end of credit scores (800-900), have a distinct Estimated Return of around .05 percent, with 700-800 categories at around .075 to .1, and 600-700 following thereafter with growing estimated returns. This could be because those with higher credit scores pay their loans off quicker, at lower rates.
Interestingly, there are a bevy of loans at the 600-700 credit score level that do not pay off loans that are less than 10,000 in total value. Even though they have credit scores similarly to those borrowing at higher dollar amounts, they were unable to pay their comparable smaller debts.
The graph above shows that the total amount of investors increases as the loan original amount increases, and invariably, the higher credit score users are more adept at getting additional investors. In the scatterplot, you hardly see any 400-700 credit score borrowers beyond the overall median line.
Interestingly, the higher end credit score borrowers of 800-900 generally have a # of investors per loan that is higher than the median and higher than there lesser counterparts, even twhen the loan original amount is the same, this means more individuals are investing smaller amounts per individual loan.
We run the tables below to view which Prosper Rating has the most loans and the highest loan amounts, we find that a score of 7 receives the highest median loan and a score of 1-2 have the lowest medians. This may be an indication of the loan worthiness of the borrowers - thus, we assue 7 to be the best loan for chance of repayment and 1 being the worse.
##
## 1 2 3 4 5 6 7
## 6935 9795 14274 18345 15581 14551 5372
## loans$ProsperRating..numeric.: 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3000 4000 3463 4000 16800
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3600 4000 4586 5000 15900
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6100 7083 10000 15000
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 5000 10000 10392 15000 25000
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 6000 10000 11622 15000 35000
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 5850 10000 11460 15000 35000
## --------------------------------------------------------
## loans$ProsperRating..numeric.: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 6000 10940 11584 16000 35000
The graph above shows that, as the number of investors and the loan original amount increases, more high-quality loans of Prosper Ratings of 4-5-6-7 are prevalent.
When looking at the loan original amount and delinquencies the last 7 years, it is no surprise that the majority of points on the graph are those that are employed, even when transparency is added for overplotting. With the “employed” category as “red”, you will notice the prevalence of red dots as the loan amount increases, and the lack of red dots as the delinquencies the last 7 years increase below $5,000 loan amounts.
This means that those who are not categorized as “employed” in the Prosper system have higher delinquencies in the past 7 years, and do not get the same level of loan as those who are employed.
In this section, we wanted to look closer at what categories may be determining loan amounts, the amount of investors, and the delinquencies in the last 7 years. We noticed that those who were not categorized as “employed” saw themselves receiving loans of roughly less than 5,000. Those same individuals were likely to have a higher rate of delinquencies in the last 7 years than those categorized as “employed”, and their loan amounts were lesser than their employed counterparts even if the delinquencies in the last 7 years were the same.
We also discovered that Prospers numeric rating runs from 1-7, with 7 being the most trustworthy rating.
I was overall surprised to see that the number of investors increases as credit score and prosper rating increases. It was my inclination that if a loan were to be deemed safe or less risky, more investors would provide higher individual amounts towards that loan, thus, lowering the total amount of investors per loan.
We did not create any models with this dataset.
The plot above shows the Loan Original Amount on the y-axis, the number of delinquencies in the last 7 years on the x-axis, with each plot indicating an employment status as a categorical variable.
This graph shows to me that, should you be employed, you have a marginal chance of receiving a loan from Prosper. Though the large majority of loans are at 0 delinquencies, it is important to notice that there are a great number of loans provided to people who have be delinquent dozens of times. You are more likely to receive a greater loan amount, however, if your employment status is “employed”
The graph above shows the number of investors per loan on the x-axis, the total loan original amount on the y-axis, with the categorical variable of credit score bucket. This plot shows that, as the loan amount increases, the number of investors increases as well.
We already know from prior graphs that higher loan amounts are more likely to be people employed with no delinquencies on their account, couple that with higher credit scores and higher loan amounts are generally more safe. We see that, as loan amounts increase, the number of investors per loan also increases.
This, to me, is a strange phenomenmon, as I would have guessed that, if a loan were to be less risky, investors would invest greater amounts towards that loan, thus limiting the total amount of investors per loan.
The plot shows borrowers with credit scores between 700-900 are generally receiving more investors at higher loan amounts than other credit scores.
The above plot shows the estimated return per loan on the x-axis, with the loan original amount on the y-axis and the credit score bucket as the categorical variable.
This plot shows that estimated returns will generally fall between .05 and .15, are nearly normally distributed, and will be somewhat dependent on the credit score of the borrower. Borrowers with higher credit scores are shown to provide less returns to investors, likely due to lower interest rates and ability to pay back a loan quickly.
The plot also shows that estimated returns for higher credit score borrowers are largely the same regardless of the loan amount, showing that loan amount does not really affect return rate, and vice versa. ——
The Prosper data was quite large and there were a number of variables that were omitted from this analysis simply due to time and expectations from the course. With greater time, one could examine the areas such as investment from friends to see if certain credit score users or certain Prosper loans are more likely to get loans from their friends.
It was interesting to note that there are a higher number of investors per loan, particularly when the credit score of the borrower is high. This goes against my intution that single or a handful of investors would want to buy out a loan completely. This potentially may show some skepticism of the Prosper platform.
I did not conduct any analysis of loan variables over time, as my questions were more focused on what affects a loan amount and the amount of investors investing in said loan. It may be interesting to explore whether estimated returns lowered or expanded overtime for Prosper loans.